William M. Perry (wmperry@indiana.edu) writes:

> Well, right now it would be pretty trivial to modify my emacs browser to
> follow _every_ link it finds and record it. Only problem would be in
> keeping it from getting in an infinite loop, but that wouldn't be too hard.
> Problem would be disk space & CPU time.

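For concreteness, the loop-avoidance bookkeeping Perry describes might look
something like the rough sketch below (in Python, purely as illustration;
fetch_page and extract_links are hypothetical helpers standing in for
whatever the browser provides, and max_pages is an arbitrary safety cap):

    from collections import deque

    def crawl(start_url, fetch_page, extract_links, max_pages=10000):
        visited = set()              # URLs already fetched; this set is
                                     # what prevents infinite loops
        queue = deque([start_url])   # frontier of URLs still to visit
        while queue and len(visited) < max_pages:
            url = queue.popleft()
            if url in visited:
                continue
            visited.add(url)
            page = fetch_page(url)   # hypothetical: (title, html), or None
            if page is None:
                continue
            title, html = page
            print(url + " :: " + title)        # the proposed record format
            for link in extract_links(html):   # hypothetical: absolute URLs
                if link not in visited:
                    queue.append(link)
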
Unfortunately, I don't think infinite loops are the only problem to be solved.
For example, we have databases of Physics Publications accessible via the web
and cross-referenced for citations. These databases contain ~300,000 entries. A
robot, even if it is smart enough not to get into a loop, could spend many days
roaming this one database trying to find all the entries. One way around that
would be to give the robot a list of places it should not look, but compiling
that list would itself be a time-consuming task.

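One cheap way to encode such a list, sketched below, is as a set of URL
prefixes the robot refuses to descend into (the entry shown is a made-up
placeholder, not a real service):

    # Hand-maintained exclusion list: URL prefixes the robot should
    # never descend into.
    EXCLUDED_PREFIXES = [
        "http://example.org/physics-db/",  # placeholder for a large database
    ]

    def allowed(url):
        # True if the URL does not fall under any excluded prefix.
        return not any(url.startswith(prefix) for prefix in EXCLUDED_PREFIXES)

The crawl loop above would then enqueue only those links for which
allowed(link) is true; the hard part, as noted, is assembling the list in the
first place.
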
Conversely, there are many interesting documents that can only be accessed by
supplying a keyword, making it difficult for a robot to discover these
documents at all.

> Once I get the browser stable, I can work on something like this - unless
> someone else wants to work on it in the meantime. Might be more
> stable/faster if written in C though. :) But then what isn't?
>
> What type of format would the output have to be in? It would be very
> easy to spit out "URL :: TITLE" into a file.

If anyone does solve these problems and generates a "URL :: TITLE" list (a few
other fields, such as a last-modified date, would be useful too), I would be
happy to try to make the information available through the database we have
interfaced to WWW.

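As a sketch of what such records might look like with the extra field, one
line per document with " :: " as the separator (the URL and date here are
placeholders):

    def format_record(url, title, last_modified=""):
        # One line per document; fields separated by " :: ".
        return url + " :: " + title + " :: " + last_modified

    def parse_record(line):
        # Assumes exactly three fields, none containing " :: ".
        url, title, last_modified = line.split(" :: ")
        return url, title, last_modified

    print(format_record("http://example.org/doc.html", "An Example Document",
                        "1993-06-01"))
    # -> http://example.org/doc.html :: An Example Document :: 1993-06-01
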
Tony Johnson